Search for: All records

Creators/Authors contains: "Goldstein, Tom"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available February 7, 2026
  2. The recent emergence of powerful Vision-Language Models (VLMs) has significantly improved image captioning, and some of these models have been extended to caption videos as well. However, their ability to understand complex scenes is limited, and the descriptions they produce tend to be overly verbose and focused on the superficial appearance of objects. Unlike general-purpose video captioning, scene descriptions, especially in movies, require a deeper contextual understanding. To address this challenge, we propose CALVIN, a specialized video LLM that leverages previous movie context to generate fully “contextual” scene descriptions. To achieve this, we train our model on a suite of tasks that integrate both image-based question answering and video captioning within a unified framework, before applying instruction tuning to refine the model’s ability to provide scene captions. Lastly, we observe that our model responds well to prompt engineering and few-shot in-context learning techniques, enabling the user to adapt it to any new movie with very little additional annotation (see the prompting sketch after this list).
  3. Free, publicly-accessible full text available June 10, 2026
  4. Free, publicly-accessible full text available June 10, 2026
  5. In this paper, Kirchenbauer et al. use a novel watermarking technique to watermark the output of large language models (LLMs) like ChatGPT, which is often in the form of AI-generated text, and to mitigate the harms associated with the increasing use of these technologies. They note that these LLMs can write documents, create executable code, and answer questions, often with human-like capability. They also list harms such as social engineering and election manipulation campaigns that exploit automated bots on social media platforms, the creation of fake news and web content, and the use of AI systems for cheating on academic writing and coding assignments. For policy makers, this technology can serve as a means to regulate and oversee the use of these LLMs on any public or social front where their AI-generated text output could pose a potential harm, such as those listed by the authors. (Methods and Metrics, watermarking LLM output; a hedged code sketch of the greenlist watermarking idea follows this list.)
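Illustrative sketch for item 2: a minimal example of how few-shot, context-aware prompting for movie scene captioning might be assembled, in the spirit of the CALVIN abstract above. The prompt template, the example data, and the query_vlm() stub are assumptions for illustration only, not the paper's actual API or released code.

# Hypothetical sketch of few-shot, context-aware scene-caption prompting.
# The prompt template, example data, and query_vlm() stub are illustrative
# assumptions, not CALVIN's actual interface.

FEWSHOT_EXAMPLES = [
    {
        "context": "Earlier scenes established that Ann is hiding her identity from Joe.",
        "scene": "Ann and Joe ride a scooter through crowded streets.",
        "caption": "Still concealing who she really is, Ann enjoys a carefree ride with Joe.",
    },
]

def build_prompt(movie_context: str, scene_description: str) -> str:
    """Assemble a few-shot prompt that pairs prior movie context with the new scene."""
    parts = ["Describe each scene using the movie context, not just visual appearance.\n"]
    for ex in FEWSHOT_EXAMPLES:
        parts.append(
            f"Context: {ex['context']}\nScene: {ex['scene']}\nCaption: {ex['caption']}\n"
        )
    parts.append(f"Context: {movie_context}\nScene: {scene_description}\nCaption:")
    return "\n".join(parts)

def query_vlm(prompt: str) -> str:
    """Placeholder for a call to a video-LLM endpoint; returns a dummy string here."""
    return "<model-generated contextual caption>"

if __name__ == "__main__":
    prompt = build_prompt(
        movie_context="The detective has just learned that his partner leaked the files.",
        scene_description="The two men sit silently in a parked car at night.",
    )
    print(query_vlm(prompt))

The point of the sketch is the structure of the prompt: each few-shot example pairs prior movie context with a scene and a context-aware caption, so the model is steered toward descriptions that use narrative context rather than surface appearance.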
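Illustrative sketch for item 5: a minimal, self-contained toy of the greenlist/redlist ("soft") watermarking idea described by Kirchenbauer et al. The "language model" below just emits random logits, and the gamma and delta values are assumed for demonstration; this is not the authors' released implementation.

# Toy sketch of greenlist watermarking: hash the previous token to pick a
# pseudo-random "greenlist" of vocabulary ids, bias their logits upward during
# sampling, then detect the watermark via a z-score on greenlist hits.
import math
import numpy as np

VOCAB_SIZE = 1000
GAMMA = 0.25   # fraction of the vocabulary on the greenlist each step (assumed value)
DELTA = 2.0    # logit bias added to greenlist tokens (assumed value)
rng_model = np.random.default_rng(0)

def greenlist(prev_token: int) -> np.ndarray:
    """Seed a PRNG with the previous token and select a pseudo-random greenlist."""
    g = np.random.default_rng(prev_token)
    ids = g.permutation(VOCAB_SIZE)[: int(GAMMA * VOCAB_SIZE)]
    mask = np.zeros(VOCAB_SIZE, dtype=bool)
    mask[ids] = True
    return mask

def toy_logits(prev_token: int) -> np.ndarray:
    """Stand-in for a real LM's next-token logits (random here; prev_token unused)."""
    return rng_model.normal(size=VOCAB_SIZE)

def generate(length: int, watermark: bool) -> list[int]:
    tokens = [0]  # arbitrary start token
    for _ in range(length):
        logits = toy_logits(tokens[-1])
        if watermark:
            logits = logits + DELTA * greenlist(tokens[-1])  # boost greenlist tokens
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        tokens.append(int(rng_model.choice(VOCAB_SIZE, p=probs)))
    return tokens[1:]

def detect(tokens: list[int], first_prev: int = 0) -> float:
    """z-score of the greenlist hit count; a large z suggests watermarked text."""
    hits, prev = 0, first_prev
    for t in tokens:
        hits += bool(greenlist(prev)[t])
        prev = t
    T = len(tokens)
    return (hits - GAMMA * T) / math.sqrt(T * GAMMA * (1 - GAMMA))

print("z (unwatermarked):", round(detect(generate(200, watermark=False)), 2))
print("z (watermarked):  ", round(detect(generate(200, watermark=True)), 2))

Detection needs only the hashing scheme, not the model itself: unwatermarked text scores near z = 0, while watermarked text scores many standard deviations above chance, which is what makes the scheme usable for oversight of AI-generated text.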